Various Approaches in Text Pre-processing

Author

  • Yanbo Wang
Abstract

Text mining, an increasingly important field of research in Knowledge Discovery in Data (KDD), concentrates on discovering hidden patterns, rules, regularities and trends in textual data, such as natural language speech or web documents. The structure of textual data is considered implicit, unlike the structured data stored in databases. This difference in the nature of textual data and database data is the major distinction between text mining and traditional data mining (the main stage of KDD). However, knowledge can be obtained from texts with data mining techniques if the textual data can be rationally converted into a structured or semi-structured form. In this paper, we summarize the various approaches to text preprocessing. Our work is presented with the aim of supporting future work in text mining research.
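To illustrate what converting textual data into a structured form can look like, here is a minimal sketch of one common preprocessing step, a bag-of-words representation. It is an assumption chosen for illustration and is not necessarily one of the approaches the paper surveys; the function names `tokenize` and `bag_of_words` and the two sample sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Turn raw documents into a sorted vocabulary and a matrix of term counts.

    Each row of the matrix corresponds to one document, and each column
    to one vocabulary term, so the text becomes a structured table.
    """
    token_lists = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    matrix = []
    for toks in token_lists:
        counts = Counter(toks)  # missing terms count as 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical sample documents, for illustration only.
docs = [
    "Text mining finds patterns in text.",
    "Data mining finds patterns in databases.",
]
vocab, matrix = bag_of_words(docs)
print(vocab)
for row in matrix:
    print(row)
```

Once documents are rows of a fixed-width count table like this, conventional data mining techniques that expect structured records can in principle be applied to them.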

Similar resources

Novel Approaches to Pre-processing Documentbase in Text Classification

Text classification is currently popular in Knowledge Discovery in Databases (KDD) and Machine Learning (ML). KDD-based text classification research focuses on statistical techniques, while the ML-based approach focuses on artificial-intelligence techniques. Text mining necessitates the pre-processing of the documentbase. Two broad approaches can be identified: (1) document representation and (...

CSCR010: Second Year Report

My PhD research focuses on Text Mining, a major research area in Knowledge Discovery in Databases (KDD), and in particular on Text Preprocessing (TPP) for the classification / categorization of documents, utilizing novel algorithms to identify hidden patterns, rules, regularities and trends within these documents. Significant techniques in Data Mining, another well-known ...

An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches

Text coherence evaluation has become a vital and appealing task in Natural Language Processing subfields such as text summarization, question answering, text generation and machine translation. Existing methods, such as entity-based and graph-based models, deal with how nouns and noun phrases change role across sequential sentences within a short part of a text. They also have limitations in global coheren...

A New Text Mining Method for Extracting User Context Information to Improve the Ranking of Search Engine Results

Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual and documentary material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo and Bing need to read a great many web documents and retrieve the ones most similar to the user ...

Text Analytics of Customers on Twitter: Brand Sentiments in Customer Support

Brand community interactions and online customer support have become major platforms for strengthening brand sentiment and creating loyalty. Rapid brand responses to each customer request through inbound tweets on Twitter, and proper actions to cover customers' needs, are the key elements of positive brand sentiment creation and product or service initiative management in the realm of ...

Publication date: 2004